Accounting for Multiple Comparisons in Decision Tree Pruning

Author

  • David Jensen
Abstract

Overfitting is a widely observed pathology of induction algorithms. For induction algorithms that build decision trees, pruning is a common approach to correct overfitting. Most common pruning techniques do not account for one potentially important factor: multiple comparisons. Multiple comparisons occur whenever an induction algorithm examines several candidate models and selects the one that best accords with the data. Making multiple comparisons produces systematic overestimates of accuracy. This paper empirically examines the importance of accounting for multiple comparisons when evaluating models. Specifically, it examines the effectiveness of one particular pruning method that does account for multiple comparisons: Bonferroni pruning. Based on experiments with artificial and realistic datasets, Bonferroni pruning produces trees that are smaller and at least as accurate as trees pruned using several other common approaches.
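The abstract does not spell out the pruning procedure, so the following is only a generic sketch of the standard Bonferroni adjustment, not the paper's implementation. It assumes that each candidate split comes with a p-value and that `k` candidate splits were compared at the node; the function names `family_wise_error`, `bonferroni_adjust`, and `keep_split` are hypothetical and chosen for illustration.

```python
def family_wise_error(alpha: float, k: int) -> float:
    """Chance that at least one of k independent comparisons on pure
    noise looks significant when each is tested at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_adjust(p_value: float, k: int) -> float:
    """Standard Bonferroni adjustment: scale the p-value by the number
    of comparisons, capped at 1."""
    return min(1.0, p_value * k)


def keep_split(p_value: float, k: int, alpha: float = 0.05) -> bool:
    """Hypothetical pruning rule (an assumption, not taken from the paper):
    keep the best of k candidate splits only if its adjusted p-value
    still passes the threshold alpha."""
    return bonferroni_adjust(p_value, k) <= alpha


if __name__ == "__main__":
    # With 20 candidates, an unadjusted 0.05 test is much more permissive
    # than it appears.
    print(family_wise_error(0.05, 20))
    # A split with raw p = 0.01 chosen from 20 candidates is pruned.
    print(keep_split(0.01, 20))
```

The first function illustrates why selecting the best of several candidates inflates apparent significance; the other two show how scaling the p-value by the number of candidates compensates for that selection.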


Similar resources

Adjusting for Multiple Comparisons in Decision Tree Pruning

Pruning is a common technique to avoid overfitting in decision trees. Most pruning techniques do not account for one important factor: multiple comparisons. Multiple comparisons occur when an induction algorithm examines several candidate models and selects the one that best accords with the data. Making multiple comparisons produces incorrect inferences about mo...


Evaluation of liquefaction potential based on CPT results using C4.5 decision tree

The prediction of liquefaction potential of soil due to an earthquake is an essential task in Civil Engineering. The decision tree is a tree structure consisting of internal and terminal nodes which process the data to ultimately yield a classification. C4.5 is a known algorithm widely used to design decision trees. In this algorithm, a pruning process is carried out to solve the problem of the...


Anomaly Detection Using SVM as Classifier and Decision Tree for Optimizing Feature Vectors

With the advancement and development of computer network technologies, the way for intruders has become smoother; therefore, to detect threats and attacks, the importance of intrusion detection systems (IDS) as one of the key elements of security is increasing. One of the challenges of intrusion detection systems is managing the large number of network traffic features. Removing un...


Adjusting for Multiple Testing in Decision Tree Pruning

Overfitting is a widely observed pathology of induction algorithms. Overfitted models contain unnecessary structure that reflects nothing more than chance variations in the particular data sample used to construct the model. Portions of these models are literally wrong, and can mislead users. Overfitted models require more storage space and take longer to execute than their correctly sized counterpa...


Optimally Pruning Decision Tree Ensembles With Feature Cost

We consider the problem of learning decision rules for prediction under a feature budget constraint. In particular, we are interested in pruning an ensemble of decision trees to reduce expected feature cost while maintaining high prediction accuracy for any test example. We propose a novel 0-1 integer program formulation for ensemble pruning. Our pruning formulation is general: it takes any ensembl...




Publication date: 2000